fix(render): correctly escape non-BMP Unicode in AsciiJSON #4693

Open
xd-sarthak wants to merge 1 commit into gin-gonic:master from xd-sarthak:fix-asciijson-nonbmp


xd-sarthak commented Jun 3, 2026

Description

AsciiJSON silently corrupts Unicode code points above U+FFFF (emoji, CJK Extension B, etc.). The escape loop formats every non-ASCII rune with "\u%04x", which only produces a valid escape for the Basic Multilingual Plane (U+0000-U+FFFF). For a rune like the grinning face (U+1F600) it emits five hex digits instead of four:

{"msg":"\u1f600"}

A JSON parser reads \u as exactly four hex digits, so it consumes \u1f60 and treats the trailing 0 as a literal character. The result is a valid-but-wrong string instead of the original code point.
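The failure mode described above can be reproduced with the standard library alone. The following is a minimal sketch, not gin's actual code: naiveEscape and decodeJSONString are hypothetical helpers introduced only for illustration, and encoding/json stands in for "a JSON parser".

```go
package main

import (
	"encoding/json"
	"fmt"
)

// naiveEscape is a hypothetical stand-in for the old behaviour: it formats
// any rune with "\u%04x", even when the value needs more than four hex digits.
func naiveEscape(r rune) string {
	return fmt.Sprintf("\\u%04x", r)
}

// decodeJSONString wraps an escape sequence in quotes and decodes it the way
// a JSON consumer would.
func decodeJSONString(escaped string) (string, error) {
	var s string
	err := json.Unmarshal([]byte(`"`+escaped+`"`), &s)
	return s, err
}

func main() {
	escaped := naiveEscape(0x1F600)
	fmt.Println(escaped) // \u1f600 (five hex digits)

	decoded, err := decodeJSONString(escaped)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", decoded)          // "ὠ0" (U+1F60 followed by a literal 0)
	fmt.Println(decoded == "\U0001F600") // false
}
```

No error is returned at any point, which is why the corruption goes unnoticed until the decoded text is inspected.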

Per RFC 8259 section 7, code points above U+FFFF must be written as a UTF-16 surrogate pair (two \uXXXX escapes). This PR detects r > 0xFFFF and emits the pair via the standard library unicode/utf16.EncodeRune:

{"msg":"\ud83d\ude00"}

which round-trips back to the original character. ASCII and BMP paths are unchanged.
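A rough sketch of the three-way split described above (ASCII, BMP, non-BMP) is shown below. It is an illustration under assumptions, not the diff in this PR: escapeNonASCII is a hypothetical name, and the real loop in render/json.go may be structured differently. Only unicode/utf16.EncodeRune is taken from the description.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf16"
)

// escapeNonASCII is a hypothetical helper that rewrites every non-ASCII rune
// in s as \uXXXX escapes, using a surrogate pair for runes above U+FFFF.
func escapeNonASCII(s string) string {
	var b strings.Builder
	for _, r := range s {
		switch {
		case r < 0x80:
			// ASCII passes through untouched.
			b.WriteRune(r)
		case r > 0xFFFF:
			// Non-BMP: split into a high and a low surrogate.
			hi, lo := utf16.EncodeRune(r)
			fmt.Fprintf(&b, "\\u%04x\\u%04x", hi, lo)
		default:
			// BMP: a single four-digit escape is enough.
			fmt.Fprintf(&b, "\\u%04x", r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(escapeNonASCII(`{"msg":"😀"}`)) // {"msg":"\ud83d\ude00"}
	fmt.Println(escapeNonASCII("é"))           // \u00e9
}
```

For U+1F600 the arithmetic is: 0x1F600 - 0x10000 = 0xF600, high surrogate 0xD800 + (0xF600 >> 10) = 0xD83D, low surrogate 0xDC00 + (0xF600 & 0x3FF) = 0xDE00.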

Fixes #4688

Changes

  • render/json.go: escape non-BMP runes as a UTF-16 surrogate pair.
  • render/ascii_nonbmp_test.go: regression test asserting AsciiJSON output is ASCII-only and round-trips back to the original value.
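The two properties the regression test asserts (ASCII-only output, and decoding back to the original value) could be checked along these lines. This is a sketch written as a standalone program rather than the test file in this PR; isASCII and roundTrips are hypothetical helpers, and the two byte strings are the outputs quoted in the description.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// isASCII reports whether every byte of out is in the ASCII range.
func isASCII(out []byte) bool {
	for _, c := range out {
		if c > 0x7F {
			return false
		}
	}
	return true
}

// roundTrips decodes out as JSON and compares the result with want.
func roundTrips(out []byte, want map[string]string) (bool, error) {
	var got map[string]string
	if err := json.Unmarshal(out, &got); err != nil {
		return false, err
	}
	return reflect.DeepEqual(got, want), nil
}

func main() {
	want := map[string]string{"msg": "\U0001F600"}
	fixed := []byte(`{"msg":"\ud83d\ude00"}`) // output after the fix
	broken := []byte(`{"msg":"\u1f600"}`)     // output before the fix

	for _, out := range [][]byte{fixed, broken} {
		ok, err := roundTrips(out, want)
		fmt.Println(isASCII(out), ok, err)
	}
	// Prints:
	// true true <nil>
	// true false <nil>
}
```

Both outputs pass the ASCII-only check, so the round-trip comparison is the assertion that actually distinguishes the old behaviour from the new one.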

Notes

  • No public API change.
  • Behavior for ASCII (U+0000-U+007F) and BMP (U+0080-U+FFFF) is byte-for-byte identical to before.
  • This changes the bytes emitted for non-BMP input from an over-long \u escape (five hex digits) to a correct surrogate pair. The previous output decoded to the wrong string for these inputs, so no correct consumer could have depended on it.

Checklist

  • PR opened against master.
  • Tests pass locally (go test ./render/); gofmt + go vet clean.
  • Test added covering the change.
  • No new feature, so docs/doc.md not applicable.

AsciiJSON escaped every non-ASCII rune with "\u%04x", which only yields a
valid escape for the Basic Multilingual Plane (U+0000-U+FFFF). For a code
point above U+FFFF such as U+1F600 it emitted five hex digits ("\u1f600").
A JSON parser reads \u as exactly four hex digits, so this decoded to "ὠ0"
instead of "😀".

Per RFC 8259, code points above U+FFFF must be encoded as a UTF-16
surrogate pair (two \uXXXX escapes). Detect r > 0xFFFF and emit the pair via
unicode/utf16.EncodeRune. ASCII and BMP output is unchanged.

Add a regression test asserting AsciiJSON output is ASCII-only and
round-trips back to the original value.

Fixes gin-gonic#4688

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

codecov Bot commented Jun 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.38%. Comparing base (3dc1cd6) to head (25173ee).
⚠️ Report is 281 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4693      +/-   ##
==========================================
- Coverage   99.21%   98.38%   -0.83%     
==========================================
  Files          42       48       +6     
  Lines        3182     3164      -18     
==========================================
- Hits         3157     3113      -44     
- Misses         17       42      +25     
- Partials        8        9       +1     
Flag                                        Coverage Δ
                                            ?
--ldflags="-checklinkname=0" -tags sonic    98.37% <100.00%> (?)
-tags go_json                               98.31% <100.00%> (?)
-tags nomsgpack                             98.36% <100.00%> (?)
go-1.18                                     ?
go-1.19                                     ?
go-1.20                                     ?
go-1.21                                     ?
go-1.25                                     98.38% <100.00%> (?)
go-1.26                                     98.38% <100.00%> (?)
macos-latest                                98.38% <100.00%> (-0.83%) ⬇️
ubuntu-latest                               98.38% <100.00%> (-0.83%) ⬇️


Development

Successfully merging this pull request may close these issues.

AsciiJSON silently corrupts non-BMP characters (emoji) by emitting malformed \u escapes
